Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario
Abstract
Often a face has a voice. Appearance sometimes has a strong relationship with one's voice. In this work, we study how a face can be converted to a voice, which is face-based voice conversion. Since there is no clean dataset that contains both face and speech, face-based voice conversion faces difficult learning and low-quality problems caused by background noise or echo. Too much redundant information for face-to-voice conversion also causes synthesis of a general style of speech. Furthermore, previous work tried to disentangle speech with bottleneck adjustment. However, it is hard to decide on the size of the bottleneck. Therefore, we propose a bottleneck-free strategy for speech disentanglement. To avoid synthesizing a general style of speech, we utilize framewise facial embedding. Adversarial learning with a multi-scale discriminator is applied so that the model achieves better quality. In addition, a self-attention module is added to focus on content-related features for in-the-wild data. Quantitative experiments show that our method outperforms previous work.
Similar Resources
Speech Analysis – Synthesis Based on the Ptdft for Voice Conversion
The voice conversion problem has become very popular worldwide. It has applications in many fields, for example in systems that make use of prerecorded speech, such as voice mailboxes or text-to-speech synthesizers based on acoustic unit concatenation. In such cases, voice modification would be a simple and efficient way to create a desired variety of voices while avoiding recording of different spe...
GMM-based voice conversion applied to emotional speech synthesis
Voice conversion method is applied to synthesizing emotional speech from standard reading (neutral) speech. Pairs of neutral speech and emotional speech are used for conversion rule training. The conversion adopts GMM (Gaussian Mixture Model) with DFW (Dynamic Frequency Warping). We also adopt STRAIGHT, the high-quality speech analysis-synthesis algorithm. As conversion target emotions, (Hot) a...
Emotional Speech Synthesis Based on Improved Codebook Mapping Voice Conversion
This paper presents a spectral transformation method for emotional speech synthesis based on voice conversion framework. Three emotions are studied, including anger, happiness and sadness. For the sake of high naturalness, superior speech quality and emotion expressiveness, our original STASC system is modified by introducing a new feature selection strategy and hierarchical codebook mapping pr...
Evaluation of VTLN-based voice conversion for embedded speech synthesis
Recently, we demonstrated that vocal tract length normalization (VTLN) can be applied to voice conversion tasks. In particular, when the conversion algorithm is performed in the time domain, this technique is very resource-efficient and, consequently, suitable for embedded applications. In this paper, we use VTLN-based voice conversion as a novel feature of a small footprint speech synthesizer runni...
Voice characteristics conversion for HMM-based speech synthesis system
In this paper, we describe an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system. Since this speech synthesis system uses phoneme HMMs as speech units, voice characteristics conversion is achieved by changing HMM parameters appropriately. To transform the voice characteristics of synthesized speech to the target speaker, we applied MAP/VFS algorithm to...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26607